Resource-limited Index Construction for Large Texts
Abstract
An inverted index stores, for each term that appears in a collection of documents, a list of document numbers containing that term. Such an index is indispensable when Boolean or informal ranked queries are to be answered. Construction of the index is, however, a non-trivial task. Simple methods using in-memory data structures cannot be used for large collections because they require too much random access storage, and traditional disk-based methods require large amounts of temporary file space. In this paper a new method is described, making use of two simple compression codes for the positive integers, and an in-place merging structure. Analysis shows that the new technique is capable of inverting a 5 Gb collection in approximately 13 hours on a typical workstation, using 40 Mb of main memory and 120 Mb of additional temporary disk space.
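To make the terms in the abstract concrete, the following is a minimal in-memory sketch of an inverted index whose document numbers are stored as gaps and encoded with a code for the positive integers. It is an illustration only and not the paper's method: the abstract does not name its two compression codes, so the Elias gamma code is used here as an assumption, and the function names `build_inverted_index`, `to_gaps`, `elias_gamma`, and `decode_gamma_stream` are hypothetical. The sketch also keeps everything in memory, which is exactly the approach the abstract says does not scale to large collections.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the ascending list of document numbers (1-based) containing it."""
    index = defaultdict(list)
    for doc_num, text in enumerate(docs, start=1):
        # set() ensures a document number is recorded once per term.
        for term in set(text.lower().split()):
            index[term].append(doc_num)
    return dict(index)


def to_gaps(postings):
    """Replace ascending document numbers with differences from the previous entry."""
    return [cur - prev for prev, cur in zip([0] + postings[:-1], postings)]


def elias_gamma(n):
    """Elias gamma code of a positive integer, returned as a string of '0' and '1'.

    Assumption: gamma is chosen for illustration; the abstract does not name its codes.
    """
    if n < 1:
        raise ValueError("gamma codes are defined for positive integers only")
    binary = bin(n)[2:]
    # Unary length prefix (zeros) followed by the binary representation.
    return "0" * (len(binary) - 1) + binary


def decode_gamma_stream(bits):
    """Decode a concatenation of Elias gamma codewords back into integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values


docs = ["the cat sat", "the dog sat", "a cat ran"]
index = build_inverted_index(docs)
gaps = to_gaps([3, 7, 8, 15])
encoded = "".join(elias_gamma(g) for g in gaps)
```

Storing gaps instead of absolute document numbers keeps most values small, and small values receive short codewords, which is why a code for the positive integers suits inverted lists.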
Similar resources
The resource-constraint project scheduling problem of the project subcontractors in a cooperative environment: Highway construction case study
Large-scale projects often have several activities which are performed by subcontractors with limited multi-resources. Project scheduling with limited resources is one of the best-known problems in operations research and optimization. The resource-constraint project scheduling problem (RCPSP) is an NP-hard problem in which the activities of a project must be scheduled to reduce the p...
Online Self-Indexed Grammar Compression
Although several grammar-based self-indexes have been proposed thus far, their applicability is limited to offline settings where whole input texts are prepared in advance, so index structures must be rebuilt whenever additional input arrives, which is often the case in the big data era. In this paper, we present the first online self-indexed grammar compression named OESP-index that can gradually build ...
Concurrent control on resource planning and revenue/expenditure estimation in large-scale shell material embankment projects management using discrete-event simulation
Resource planning in large-scale construction projects has been a complicated management issue requiring mechanisms to facilitate decision making for managers. In the present study, a computer-aided simulation model is developed based on concurrent control of resources and revenue/expenditure. The proposed method responds to the demand of resource management and scheduling in shell material emb...
A Multi-Mode Resource-Constrained Optimization of Time-Cost Trade-off Problems in Project Scheduling Using a Genetic Algorithm
In this paper, we present a genetic algorithm (GA) for optimization of a multi-mode resource-constrained time-cost trade-off (MRCTCT) problem. In the proposed GA, each activity has several operational modes, and each mode identifies a possible execution time and cost of the activity. Beyond earlier studies on the time-cost trade-off problem, in the MRCTCT problem, resource requirements of each execution mo...
Genetic Algorithms for Optimization of Resource Allocation in Large Scale Construction Project Management
It is well known that a construction project is a process of resource consumption. Large projects in particular involve more kinds of resources, in very large amounts. During construction, resources are limited and time is short, so large-scale project management raises important questions such as how to effectively distribute resources betwe...